About us

Hi!
We are Aviv, Kesem and Nitzan, 2nd year of master’s degree in social cognitive psychology.
Here we present our final project at Information Visualization course at Ben-Gurion University of the Negev, semester a 2022.

I.D.s:
Aviv Mokady: 203970249
Kesem Adi: 302585823
Nitzan Zilberman: 315604710

About The data

About The data

Our data was retrieved from link, link, link. It describes the grades in the Bagrut exams across high schools in Israel.
We had four different data frames, one for 2013-2016, and for each year 2017, 2018, 2019.
Each row is an observations of an exam grade, while each column represents a characteristic of the school or the exam.

  • 2013 – 2016
    Contain eight variables: Grades, Number of students, Number of Yahal (number of study unites), Graduation year, Subject, School name, School ID.
    The data consists of observations for 69,638 exams in 315 cities, 976 schools and 118 subjects.

  • 2017
    Contain eleven variables: eight variables similar to the 2013-2016 dataset and three additional variables: supervision, sector, and district.
    The data consists of observations for 14,184 exams in 306 cities, 1,063 schools and 110 subjects.

  • 2018
    Contain eleven variables, the same as in 2017.
    The data consists of observations for 14,975 exams in 311 cities, 1,020 schools and 109 subjects.

  • 2019
    Contain eleven variables, the same as in 2017 and 2018.
    The data consists of observations for 15,192 exams in 320 cities, 1,060 schools and 108 subjects.

  • Combining
    The Final combined dataset consists of observations for 113,989 exams in 336 cities, 1,174 schools and 130 subjects.

Coordination
We knew we wanted to present some of the information on a map, to do so we searched for the coordinates of the school and the cities in our data.
The cities were taken from here and the schools taken from here. We then combined this information with our data to create our final and complete dataframe for this project.

City school_id students yahal grad_year subject school_name grades district sector supervision school_x school_y city_x city_y
אבו גוש 148080 25 5 2018 מדעי לימודי הסביבה מקיף אבו גוש 74.48 ירושלים ערבי כללי 35.10649 31.81075 3908447 3737870
אבו גוש 148080 14 4 2014 מתמטיקה מקיף אבו גוש 73.79 ירושלים ערבי כללי 35.10649 31.81075 3908447 3737870
אבו גוש 148080 34 5 2015 מדעי לימודי הסביבה מקיף אבו גוש 76.79 ירושלים ערבי כללי 35.10649 31.81075 3908447 3737870
אבו גוש 148080 13 5 2015 כימיה מקיף אבו גוש 75.15 ירושלים ערבי כללי 35.10649 31.81075 3908447 3737870
אבו גוש 148080 26 5 2016 מדעי לימודי הסביבה מקיף אבו גוש 76.96 ירושלים ערבי כללי 35.10649 31.81075 3908447 3737870
אבו גוש 148080 20 4 2015 מתמטיקה מקיף אבו גוש 68.65 ירושלים ערבי כללי 35.10649 31.81075 3908447 3737870

What’s been done

Haaretz newspaper reported the Bagrut grades for 2010-2014. You can find the original article here.

City - Subject

In the article, they show an interactive graph in which you can select the city and the subject and it will present the trend of the grades along the years. The graph alows you to compare different cities and choose a specific year to receive more information about.

In general this visualization is an example of a well done visualization, but it could be improved. We think instead of having the years as a separate choice above the graph, this could have been done by making the points along the line chart interactive such that clicking on them will change the left information window.

Big cities - Subject

Another graph from the same Haaretz article allows to select a subject and a year and shows you the grades in the big cities.

IN contrast to the first visualization, we think this one is not as good. It looks nice, but where are the grades? The grades do not appear on the graph and the dashed lines not indicative.
Moreover, why were those cities were chosen? Tel-Aviv is one of the biggest but it’s not here. Additionally, when choosing different subjects or years, the ploted citeis change with no good explanation to how they are being chosen.
We also recommend to highlight the overall mean grades, right now you can’t compare to th big cities to the general population.

‘Give Five’- Bennet’s program to increase 5 Yahal in Math

Globes’s article (link) wanted to test whether Bennet’s program to double the amount of students who are in 5 Yahal in Math worked as he claims. To do so they ploted this graph:

As Globes shows, Bennet’s program is not necessarily the cause in the increase.
The program measured success according to the absolute number of students. Clearly they were not present at the ‘What Not To Do’ part of our course, because if they were they would know you should test improvement with percentages, as we will show in our first visualization.

Cleaning

District, Supervision, and sector

  • The data of 2013-1016 did not include these variables. We started by adding them using the school ID variable from the rest of our datasets.
  • We turned ‘Haredi’ district into ‘Haredi’ sector.
  • We turned ‘Dati’ supervision into ‘Dati’ sector.
  • we turned ‘Jerusalem Director of Education’ district into its geographic district (Jerusalem).
  • ‘Settlement education’ district was replace with the school’s geographic district.

We remained with six geographic districts (South, center, Tel-Aviv, Jerusalem, Haifa, North), six sectors (Haredi, Dati, Jewish, Badawi, Arab, Druz) and three supervisions (General, Dati, Independent).

Coordinates

  • We excluded Cities without coordinates.
  • We excluded schools without coordinates.

General cleaning

  • We removed rows with Missing values.
  • We removed exams (rows) with less than 11 students. No datasets included grades for exams under 11 students because of privacy issues, in some datasets exams with over 5 students were included but without the grade.

Our Vis

5 Yahal

  • Why:
    The ‘Give Five’ program was a major program for the Ministry of Education led by P.M. Bennet. According to the Ministry and Bennet, the program worked well, we want to verify this claim.

  • What:
    A line chart visualizing the trend in number and precentage of students in 5 Yahal in Math and in English divided by sectors.

  • How:
    Using a line chart we examine the change in number and percentage of students in 5 Yahal in Math. Additionally, we examine whether the change is similar in different sectors, or whether some sectors were affected more than others. Finally, we compare the change in math with the change in English to see if the program is the cause of the change.

Here is the link for the interactive app.


We see that similar to Globes’s finding, the increase in number of students in 5 Yahal has began before the program started (before 2015). Furthermore, a similar increase is observed in English 5 Yahal students, where no program was employed.

Map

  • Why:
    There are claims of inequality between the center and big cities of the country and the periphery, in jobs, salaries, health systems and education. This data and project gives us an opportunity to test this claim on the aspect of education through Bagrut grades and geographical information. We will examine if there is a visible difference between central cities and periphery areas in math grades in 2019.

  • What:
    a map of Israel and color the point of each school by the average math grade. to simplify visibility we will divide the data to four groups-

    • schools who’s average is 1.5 standard deviations above the country average.
    • schools who’s average is 0.5 standard deviations above the country average.
    • schools who’s average is 0.5 standard deviations below the country average.
    • schools who’s average is 1.5 standard deviations below the country average.
  • How:
    The map with scater plot on it shows the geographical positions of the school, by coloring them in positive and negative (green and red) colors, we could see areas which are colored more positive (i.e., above average areas) and areas which are colored more negative (i.e., below average areas).


The map shows that central areas (i.e., Gush-Dan and Jerusalem areas) are scattered with more green dots, whereas the periphery is scattered with more red dots, thus validating the claims of inequality in education, for math grade the least.

Top-Bottom Citizenship grades

  • Why:
    Ezrahut (Citizenship) is a subject that is learned the same in all sectors although it might be problematic for some sectors more than others. Does this effect the abilities of some sectors while giving an advantage to others?

  • What:
    A bar plot comparing the best and worst schools by Ezrahut grades from 2019 (the latest year in our data). Comparison will be done by deviation from country average. The bars will be colored by sector.

  • How:
    If all the best 5 schools are colored differently than worst 5, this will indicate of a major difference between sectors. on the contrast, if both the best and the worst are multicolored, it will indicate equality.


We see no differences between sectors, they appear the same within the best schools and the worst schools. Equality to all ;-)

Sector/District Per-unit

  • Why:
    We wanted to see whether there are trends in the Bagrut grades along the years. Also, we wanted to compare those trends across different sectors and districts.

  • What:
    The graph demonstrates the mean grades over the years in Math or English along the different Yahal (the two most indicative subjects because they are learned similarly in all sectors with no special advantage to any sector). The trends are presented for each sector or district.

  • How:
    With line charts we can identify trends along the years, and by coloring them differently for each sector or district we can compare them.


Here is the link for the interactive app.


In general, it seems that in most cases Jewish sectors (e.g., Jewish, Religious and Haredi) and central geographic districts, have higher mean grades. The worst grades are presented in south district and Bedouin sector, which are partly correlated.
In Math, we see an increase in the average grades, from 2013 to 2015 and from then a decrease is presented.
In English, for 4-5 Yahal there is an overall decrease, whereas at 3 Yahal the trends are moderately positive.

First language - English

  • Why:
    We thought it will be interesting to see if there is a general correlation between the abilities in language subjects by examining student’s grades in thier first language and their grades in English (a second language).

  • What:
    We created a scatter plot to present the correlation between a student’s first language and their grades in English at the different Yahals.
    On the X axis we present the first language (Hebrew for Jewish and Arabic for Arabs) and each sector (Jewish and Arab) is colored differently.

  • How:
    A scatter plot is the best way to demonstrate a correlation between variables. This will allow us to compare the correlations of Jewish and Arab sectors between thier first language abilities and their performances in English.

Here is the link for the interactive app.


We see an overall weak correlations between these subject. Yet, the correlations in the Jewish sector are stronger, especially in 5 Yahal.

Hebrew - History/Literature correlation

  • Why:
    Continuing with searching for correlations, we also wanted to see if there are connections between the linguistic subjects for the Jewish sector. That is, if there will be a correlation between abilities in Hebrew and abilities in History or Literature, two subjects that are based on being good in Hebrew.
    We also wanted to see if those correlations are consistent over the years.

  • What:
    We created a scatter plot to present the correlation between the grades in Hebrew and the grades in either History or Literature. A selection between the years is also possible, in order to compare the correlations over the years.

  • How:
    Once again, a scatter plot is the best way to demonstrate a correlation between variables. This allows us to compare the correlations of Hebrew-History and Hebrew-Literature over the years.

Here is the link for the interactive app.


The correlation between Hebrew and History is stronger than the correlation between Hebrew and Literature at all of the years, but the gap is getting smaller at 2018 and 2019.

Subject correlation

  • Why:
    We wanted to see whether there are, in general, subject who correlates with each other (besides those we already saw).

  • What:
    We created a heatmap of the correlation between all the core subjects (chosen by subjects that are thought in all schools).
    For subjects that are taught differently in different sectors we created one combined subject. For example, History is taught differently in different sectors, History in the heatmap represents a combination of Jewish history for Jewish schools, Arab history for Arabic schools and Druze history for Druze schools.
    We present the average grade across the years.

  • How:
    A heatmap can represent the correlations differences in the most interpetable way. The more the square is colored in strong blue, the higher the correlation. The more the square is colored in strong yellow, the lower the correlation.

*No negative correlations were found


We can see quite clearly that English and Sport are the most non-correlated subject and have the weakest connections to the other subjects, since the colors there seem lighter than at the others. Surprising exceptions can be seen at sport-Math and Sports-First language. Math and First language seem to have high correlations with all of the subjects besides English. The low correlations between First language and english is not surprising and match what we saw in another visualization even when separating to different first languages.

Our Code

# setup ------------------------------------------------------------------------

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, comment = '',
                      message = FALSE, paged.print = FALSE, 
                      dpi = 300, fig.width = 8, fig.height = 5, 
                      fig.align = 'center')

if(!require(rmdformats)) install.packages("rmdformats",repos = "http://cran.us.r-project.org")
## Loading required package: rmdformats
if(!require(tidyverse)) install.packages("tidyverse",repos = "http://cran.us.r-project.org")
if(!require(rstatix)) install.packages("rstatix",repos = "http://cran.us.r-project.org")
if(!require(ggplot2)) install.packages("ggplot2",repos = "http://cran.us.r-project.org")
if(!require(plotly)) install.packages("plotly",repos = "http://cran.us.r-project.org")
if(!require(ggpubr)) install.packages("ggpubr",repos = "http://cran.us.r-project.org")
if(!require(tidylog)) install.packages("tidylog",repos = "http://cran.us.r-project.org")
if(!require(shiny)) install.packages("shiny",repos = "http://cran.us.r-project.org")
if(!require(rgdal)) install.packages("rgdal",repos = "http://cran.us.r-project.org")
if(!require(sp)) install.packages("sp",repos = "http://cran.us.r-project.org")
if(!require(ggcorrplot)) install.packages("ggcorrplot",repos = "http://cran.us.r-project.org")

library(tidyverse)
library(rstatix)
library(ggplot2)
library(plotly)
library(ggpubr)
library(tidylog)
library(shiny)
library(rgdal)
library(sp)
library(ggcorrplot)


dat <- read.csv("df_csv.csv", header = TRUE, encoding = "UTF-8")
dat$subject <- as.character(trimws(dat$subject, which = c("both")))
dat$district <- as.character(trimws(dat$district, which = c("both")))
dat$sector <- as.character(trimws(dat$sector, which = c("both")))
dat$supervision <- as.character(trimws(dat$supervision, which = c("both")))
colnames(dat)[1] <- "City"


##### Number of students in 5 yahal math or english ---------------------------

dat1 <- dat
dat1$subject[dat1$subject == "מתמטיקה"] <- "Math"
dat1$subject[dat1$subject == "אנגלית"] <- "English"
x <- which(dat1$subject=="Math")
y <- which(dat1$subject=="English")
c <- c(x,y)
dat1 <- dat1[c,]
dat1$subject <- as.factor(dat1$subject)

dat1 <- dat1 %>%
  filter(sector != "NA")
## filter: no rows removed
server <- function(input, output, session) {
  
  #Summarize Data and then Plot
  data1 <- reactive({
    req(input$subject)
    req(input$type)
    df <- if(input$type == "Absolute"){
      dat1 %>% 
      filter(subject %in% input$subject) %>%
      filter(yahal == 5) %>% 
      group_by(sector, grad_year) %>%
      summarise(total = sum(students))
    }
    else if(input$type == "Percentage"){
      dat1 %>% 
      filter(subject %in% input$subject) %>%
      group_by(sector, grad_year) %>%
      summarise(total = sum(students[yahal == 5]) / sum(students))
    }
})

    #Plot 
  output$plot <- renderPlotly({
    ggplotly(
      ggplot(data1(),
             aes(x = grad_year, y = total, colour = sector))+
      geom_line(size = 1.5) +
      scale_colour_manual(values=c("יהודי" = "#0d0887",
                               "דתי" = "#6a00a8",
                               "חרדי" = "#b12a90",
                               "דרוזי" = "#e16462",
                               "בדואי" = "#fca636",
                               "ערבי" = "#f0f921"))+
      theme_minimal()+
      scale_x_continuous(breaks=c(2013, 2014, 2015, 2016, 2017, 2018, 2019))+
      labs(y = "Number / Percentage of Students in 5 Yahal",
           x = "Year of graduation")+
        theme(legend.title=element_blank(),
              plot.margin=unit(c(1,3,1,1),"cm")),
      tooltip = c("x", "y")) %>% 
      layout(yaxis = list(fixedrange = TRUE),
             xaxis = list(fixedrange = TRUE),
             margin = list(r = 10, l = 20)) %>% 
      config(displayModeBar = FALSE)
  })
}

ui <- basicPage(
  h1("Number of students in 5 Yahal - math/english"),
  selectInput(inputId = "subject",
              label = "Choose subject",
              list("Math", "English")),
  selectInput(inputId = "type",
              label = "Choose Information Type",
              list("Absolute","Percentage")),
  mainPanel(plotlyOutput("plot")),
  h6("* For anonymization purposes, questionnaires with less than 11 students at the school do not appear in the data. Both the sum of students and percentage of students are callculated for questionnaires that 11 or more students took in a specific school"))

shinyApp(ui, server)
## PhantomJS not found. You can install it with webshot::install_phantomjs(). If it is installed, please make sure the phantomjs executable can be found via the PATH variable.

Shiny applications not supported in static R Markdown documents

# Map --------------------------------------------------------------------------

dat2 <- dat %>%
  filter(sector != "NA",
         grad_year == 2019,
         subject == "מתמטיקה") %>%
  group_by(school_id, school_name, school_x, school_y) %>%
  summarise(school_mean = mean(grades)) %>%
  ungroup(school_id, school_name, school_x, school_y) %>%
  mutate(mean_all = mean(school_mean)) %>%
  mutate(dev = scale(school_mean))
## filter: removed 78,564 rows (98%), 1,603 rows remaining
## group_by: 4 grouping variables (school_id, school_name, school_x, school_y)
## summarise: now 763 rows and 5 columns, 3 group variables remaining (school_id, school_name, school_x)
## ungroup: no grouping variables
## mutate: new variable 'mean_all' (double) with one unique value and 0% NA
## mutate: new variable 'dev' (double) with 720 unique values and 0% NA
dat2$rank <- ifelse(dat2$dev > 1.5, "Higest grades", NA)
dat2$rank <- ifelse(dat2$dev > 0.5 & dat2$dev <= 1.5, "Grades above avg", dat2$rank)
dat2$rank <- ifelse(dat2$dev < -0.5 & dat2$dev >= -1.5,"Grades below avg", dat2$rank)
dat2$rank <- ifelse(dat2$dev < -1.5, "Lowest grades", dat2$rank)

dat2 <- na.omit(dat2)
dat2$rank <- factor(dat2$rank,levels = c("Higest grades",
                                         "Grades above avg",
                                         "Grades below avg",
                                         "Lowest grades" ) )

##### Map

israel <- readOGR(
  dsn= getwd() ,
)
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\אביב\Google Drive\Study\ויזואליזציה של מידע\Data_Vizualization", layer: "gadm36_ISR_1_area"
## with 7 features
## It has 1 fields
is <- tidy(israel)
## Regions defined for each Polygons
westbank <- readRDS("rds.rds", refhook = NULL)
wb <- tidy(westbank)
## Regions defined for each Polygons
wb$id <- 8
wb$group <- as.factor(7.1)
class(is$group)
## [1] "factor"
SP <- rbind(is, wb)
SP$id[SP$id == 0] <- 4


map <- ggplot(SP, aes(long, lat, group = group)) +
  geom_polygon(alpha = 0.5) +
  coord_equal() +
  geom_point(data = dat2, aes(x = school_x, y = school_y,color = rank,
                            text = paste0(school_name,
                                          "\nMean Grade: ", round(school_mean,2))),
             inherit.aes = F, alpha = 0.7, size = 0.9) +
  scale_colour_manual(values=c("Higest grades" = "green",
                               "Grades above avg" = "green4",
                               "Grades below avg" = "firebrick4",
                               "Lowest grades" = "firebrick1"))+
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank(),axis.line = element_blank(),
        axis.ticks.x = element_blank(), axis.text.x = element_blank(),
        axis.ticks.y = element_blank(), axis.text.y = element_blank(),
        axis.title.x = element_blank(), axis.title.y = element_blank(),
        plot.title = element_text(hjust = 0.5))+
  labs(title = "Map of schools in Israel by Math grades in 2019")+
  annotate("text", x = 36.15000, y = 30.90000, label = paste0("Country Mean", round(dat2$mean_all,2)))
## Warning: Ignoring unknown aesthetics: text
ggplotly(map, tooltip = "text") %>%
  layout(legend = list( x = 0.1, y = 0.9))
# Top-Bottom Citizenship grades ------------------------------------------------

dat3 <- dat %>%
  filter(grad_year == 2019,
         subject == "אזרחות",
         yahal == 2) %>%
  group_by(school_id) %>%
  mutate(mean_school = mean(grades)) %>%
  ungroup() %>%
  mutate(mean_all = mean(grades),
         dev = mean_school-mean_all) %>%
  arrange(desc(mean_school)) %>%
  filter(row_number() == 1:5 | row_number() > n()-5) %>%
  mutate(type = case_when(dev > 0 ~ "Best",
                        TRUE ~ "Worst"))
## filter: removed 79,354 rows (99%), 813 rows remaining
## group_by: one grouping variable (school_id)
## mutate (grouped): new variable 'mean_school' (double) with 701 unique values and 0% NA
## ungroup: no grouping variables
## mutate: new variable 'mean_all' (double) with one unique value and 0% NA
##         new variable 'dev' (double) with 701 unique values and 0% NA
## Warning in row_number() == 1:5: longer object length is not a multiple of
## shorter object length
## filter: removed 803 rows (99%), 10 rows remaining
## mutate: new variable 'type' (character) with 2 unique values and 0% NA
ggplotly(
      ggplot(dat3,
        aes(x = reorder(school_name, dev), fill = sector, weight = dev, text = paste0("School Average: ", mean_school))) +
        geom_bar() +
        scale_fill_manual(values=c("יהודי" = "#0d0887",
                               "דתי" = "#6a00a8",
                               "חרדי" = "#b12a90",
                               "ערבי" = "#f0f921")) +
        labs(x = "School Name", y = "Deviation from Country Average") +
        coord_flip() +
        theme_minimal() +
        geom_hline(yintercept = 0, size = 2, color = "red") +
        annotate("text", x = 9.5, y = -10,
          label = paste0("Country Average:", round(dat3$mean_all[1],2)), color = "red"),
      tooltip = "text") %>%
      layout(yaxis = list(fixedrange = TRUE),
             xaxis = list(fixedrange = TRUE)) %>% 
      config(displayModeBar = FALSE)
# Sector/District Per-unit -----------------------------------------------------

dat4 <- dat
dat4$subject[dat4$subject == "מתמטיקה"] <- "Math"
dat4$subject[dat4$subject == "אנגלית"] <- "English"
x <- which(dat4$subject=="Math")
y <- which(dat4$subject=="English")
c <- c(x,y)
dat4 <- dat4[c,]
dat4$subject <- as.factor(dat4$subject)

dat4 <- dat4 %>%
  filter(sector != "NA")
## filter: no rows removed
server <- function(input, output, session) {
  
  data4 <- reactive({
    req(input$subject)
    req(input$comparison)
    req(input$yahal)
    df <- if(input$comparison == "sector"){
      dat4 %>% 
      filter(subject %in% input$subject, yahal %in% input$yahal) %>%
      group_by(sector, grad_year) %>%
      summarise(students = mean(grades)) %>% 
      rename(comp = 1)
    }
    else if(input$comparison == "district"){
      dat4 %>% 
      filter(subject %in% input$subject, yahal %in% input$yahal) %>%
      group_by(district, grad_year) %>%
      summarise(students = mean(grades)) %>% 
      rename(comp = 1)
    }
    })

    #Plot 
  output$plot <- renderPlotly({
    ggplotly(
      ggplot(data4(),
             aes(x = grad_year, y = students, group = comp, color = comp,
                 text = paste0(comp,
                               "\nGraduation Year: ", grad_year,
                               "\nMean Grade: ", round(students,2))))+
        scale_x_continuous(breaks=c(2013, 2014, 2015, 2016, 2017, 2018, 2019))+
      geom_line(size = 1.5) +
      scale_colour_manual(values=c("יהודי" = "#0d0887",
                               "דתי" = "#6a00a8",
                               "חרדי" = "#b12a90",
                               "דרוזי" = "#e16462",
                               "בדואי" = "#fca636",
                               "ערבי" = "#f0f921",
                               "צפון" = "#1a9850",
                               "חיפה" = "#91cf60",
                               "תל אביב" = "#d9ef8b",
                               "מרכז" = "#fee08b",
                               "ירושלים" = "#fc8d59",
                               "דרום" = "#d73027"))+
      theme_minimal()+
      theme(legend.title=element_blank())+
      labs(y = "Mean Grade",
           x = "Year of graduation"),
      tooltip = c("text")) %>% 
      layout(yaxis = list(fixedrange = TRUE),
             xaxis = list(fixedrange = TRUE)) %>% 
      config(displayModeBar = FALSE)
  })
}

ui <- basicPage(
  h1("Mean Grades pre Sector - math/english"),
  selectInput(inputId = "subject",
              label = "Choose subject",
              list("Math", "English")),
  selectInput(inputId = "comparison",
              label = "Choose comparison variable",
              list("district", "sector")),
  selectInput(inputId = "yahal",
              label = "Choose number of yahal",
              list("3", "4", "5")),
    mainPanel(plotlyOutput("plot")))
  

shinyApp(ui, server)

Shiny applications not supported in static R Markdown documents

# First Lenguage - English -----------------------------------------------------

dat5 <- dat %>% 
  select(c('school_id', 'yahal', 'grad_year', 'subject', 'grades', 'sector'))
## select: dropped 9 variables (City, students, school_name, district, supervision, …)
dat5$subject[dat5$subject == "אנגלית"] <- "English"
dat5$subject[dat5$subject == "הבעה עברית"] <- "Hebrew"
dat5$subject[dat5$subject == "ערבית לערבים"] <- "Arabic"
dat5$subject[dat5$subject == "ערבית לדרוזים"] <- "Arabic"

english_dat <- dat5[which(dat5$subject=="English"),]
colnames(english_dat)[4] <- "english"
colnames(english_dat)[5] <- "english_grades"

hebrew_dat <- dat5[which(dat5$subject=="Hebrew"),]
colnames(hebrew_dat)[4] <- "first_language"
colnames(hebrew_dat)[5] <- "first_language_grade"

arabic_dat <- dat5[which(dat5$subject=="Arabic"),]
colnames(arabic_dat)[4] <- "first_language"
colnames(arabic_dat)[5] <- "first_language_grade"

dat5 <- rbind(hebrew_dat, arabic_dat)
dat5 <- merge(x = dat5, y = english_dat,
              by = c("school_id", "sector", "grad_year"), 
              all.x = FALSE, all.y = TRUE)
dat5 <- na.omit(dat5)


server <- function(input, output, session) {
  
  data5 <- reactive({
    req(input$yahal)
    df <- dat5 %>% 
      filter(yahal.y %in% input$yahal)
    })

    #Plot 
  output$plot <- renderPlotly({
    ggplotly(
      ggplot(data5(),
             aes(x = first_language_grade, y = english_grades, color = first_language))+
        geom_point(alpha = 0.5)+
        stat_cor(aes(label = paste("r = ", ..r.., sep = " ")), label.x = 50, label.y = c(98,95))+
        geom_smooth(method = lm)+
        scale_color_manual(values=c("#fcd225", "#0d0887"))+
      theme_minimal()+
      labs(x = "First Language Mean Grades",
           y = "English Mean Grades")+
        theme(plot.margin=unit(c(1,1,1,1),"cm")),
      tooltip = c("x", "y")) %>% 
      config(displayModeBar = FALSE)
  })
}

ui <- basicPage(
  h1("Correlation between abilities in a first language and in a second language"),
  selectInput(inputId = "yahal",
              label = "Choose how many English yahal",
              list("3", "4", "5")),
    mainPanel(plotlyOutput("plot")))
  

shinyApp(ui, server)

Shiny applications not supported in static R Markdown documents

# Hebrew - History/Literature correlation --------------------------------------

dat6 <- dat
dat6$subject[dat6$subject == "הבעה עברית"] <- "Hebrew"
dat6$subject[dat6$subject == "הסטוריה"] <- "History"
dat6$subject[dat6$subject == "ספרות"] <- "Literature"
dat6 <- dat6[dat6$subject == "Hebrew"|
             dat6$subject == "History"|
             dat6$subject == "Literature",]

dat6 <- dat6 %>% 
  filter(yahal == 2) %>% 
  select(c('school_id', 'grad_year', 'subject', 'grades'))
## filter: removed 834 rows (7%), 10,805 rows remaining
## select: dropped 11 variables (City, students, yahal, school_name, district, …)
dat6 <- reshape(dat6, idvar = c("school_id", "grad_year"),
                timevar = "subject", direction = "wide")

dat6 <- na.omit(dat6)
colnames(dat6)[3] <- "Hebrew"
colnames(dat6)[4] <- "History"
colnames(dat6)[5] <- "Literature"


server <- function(input, output, session) {
  
  data6 <- reactive({
    req(input$subject)
    req(input$Year)
    df <- if(input$subject == "History"){
      dat6 %>% 
      filter(grad_year %in% input$Year) %>%
      select(c('school_id', 'Hebrew', 'History')) %>%
      rename(Grades = 3)
    }
    else if(input$subject == "Literature"){
      dat6 %>% 
      filter(grad_year %in% input$Year) %>% 
      select(c('school_id', 'Hebrew', 'Literature')) %>% 
      rename(Grades = 3)
    }
    })
  
    #Plot 
  output$plot <- renderPlotly({
    ggplotly(
      ggplot(data6(),
             aes(x = Hebrew, y = Grades))+
        geom_point(alpha = 0.7)+
        stat_cor(aes(label = paste("r = ", ..r.., sep = " ")), label.x = 50, label.y = 82)+
        geom_smooth(method = lm)+
        scale_colour_viridis_d(option = "plasma", direction = 1)+
      theme_minimal()+
      labs(x = "First Language Mean Grades",
           y = "Language-Rich Subjects (History / Literature)")+
        theme(plot.margin=unit(c(1,1,2,1),"cm")),
      tooltip = c("x", "y")) %>%
      config(displayModeBar = FALSE)
  })
}

ui <- basicPage(
  h1("Correlation between Hebrew for jewish and History/Literature"),
  selectInput(inputId = "Year",
              label = "Choose a graduation year",
              list("2013", "2014", "2015", "2016", "2017", "2018", "2019")),
    selectInput(inputId = "subject",
              label = "Choose subject",
              list("History", "Literature")),
    mainPanel(plotlyOutput("plot")))


shinyApp(ui, server)

Shiny applications not supported in static R Markdown documents

# Subject correlations ---------------------------------------------------------

dat7 <- dat
dat7$subject[dat7$subject == "הבעה עברית"] <- "First language"
dat7$subject[dat7$subject == "ערבית לערבים"] <- "First language"
dat7$subject[dat7$subject == "ערבית לדרוזים"] <- "First language"
dat7$subject[dat7$subject == "הסטוריה"] <- "History"
dat7$subject[dat7$subject == "הסטוריה לבי'ס דרוזי"] <- "History"
dat7$subject[dat7$subject == "הסטוריה לבי'ס ערבי"] <- "History"
dat7$subject[dat7$subject == "ספרות"] <- "Literature"
dat7$subject[dat7$subject == "אנגלית"] <- "English"
dat7$subject[dat7$subject == "מתמטיקה"] <- "Math"
dat7$subject[dat7$subject == "אזרחות"] <- "Citizenship"
dat7$subject[dat7$subject == "חנוך גופני"] <- "Sport"

sub <- c("Math","English","Literature","History","First language","Citizenship", "Sport")

dat7 <- dat7 %>% 
  filter(subject %in% sub) %>%
  group_by(school_id, subject) %>%
  get_summary_stats(grades, type = "mean") %>%
    select(c('school_id', 'subject', 'mean'))
## filter: removed 39,350 rows (49%), 40,817 rows remaining
## group_by: 2 grouping variables (school_id, subject)
## select: dropped 2 variables (variable, n)
dat7$subject <- as.factor(dat7$subject)

heat_dat <- pivot_wider(dat7, names_from = "subject", 
                      values_from = "mean")
## pivot_wider: reorganized (subject, mean) into (Citizenship, English, First language, History, Literature, …) [was 4878x3, now 884x8]
heat_dat <- heat_dat %>%
  na.omit(dat7) %>%
  select(-school_id)
## select: dropped one variable (school_id)
cormat <- cor(heat_dat)


ggplotly(
      ggcorrplot(cormat) + 
  scale_fill_gradient(limit = c(0,1), low = "yellow", high =  "blue"),
      tooltip = c("value")) %>% 
      config(displayModeBar = FALSE)
## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.